9 research outputs found

Multimodal Analogical Reasoning over Knowledge Graphs

Author: Chen Huajun
Chen Xiang
Deng Shumin
Li Lei
Liang Xiaozhuan
Zhang Ningyu
Publication date: 25/01/2023

Analogical reasoning is fundamental to human cognition and holds an important place in various fields. However, previous studies mainly focus on single-modal analogical reasoning and do not take advantage of structural knowledge. Notably, research in cognitive psychology has demonstrated that information from multimodal sources always brings more powerful cognitive transfer than single-modality sources. To this end, we introduce the new task of multimodal analogical reasoning over knowledge graphs, which requires multimodal reasoning ability with the help of background knowledge. Specifically, we construct a Multimodal Analogical Reasoning dataSet (MARS) and a multimodal knowledge graph MarKG. We evaluate with multimodal knowledge graph embedding and pre-trained Transformer baselines, illustrating the potential challenges of the proposed task. We further propose a novel model-agnostic Multimodal analogical reasoning framework with Transformer (MarT) motivated by the structure mapping theory, which can obtain better performance. Code and datasets are available at https://github.com/zjunlp/MKG_Analogy.
Comment: Accepted by ICLR 202

arXiv.org e-Print Archive

Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

Author: Chen Huajun
Chen Zhuo
Fan Xiaohui
Fang Yin
Huang Rui
Liang Xiaozhuan
Liu Kangwei
Zhang Ningyu
Publication date: 29/08/2023

Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive instruction dataset expressly designed for the biomolecular realm. Mol-Instructions is composed of three pivotal components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions, each curated to enhance the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on a representative LLM, we underscore the potency of Mol-Instructions to enhance the adaptability and cognitive acuity of large models within the complex sphere of biomolecular studies, thereby promoting advancements in the biomolecular research community. Mol-Instructions is made publicly accessible for future research endeavors and will be subjected to continual updates for enhanced applicability.
Comment: Project homepage: https://github.com/zjunlp/Mol-Instructions. Add quantitative evaluation

arXiv.org e-Print Archive

Contrastive Demonstration Tuning for Pre-trained Language Models

Author: Bi Zhen
Chen Huajun
Cheng Siyuan
Huang Fei
Huang Songfang
Liang Xiaozhuan
Tan Chuanqi
Zhang Ningyu
Zhang Zhenru
Publication date: 18/04/2022

Pretrained language models can be effectively stimulated by textual prompts or demonstrations, especially in low-data scenarios. Recent works have focused on automatically searching discrete or continuous prompts or optimized verbalizers, yet studies of demonstrations are still limited. Concretely, the demonstration examples are crucial for an excellent final performance of prompt-tuning. In this paper, we propose a novel pluggable, extensible, and efficient approach named contrastive demonstration tuning, which is free of demonstration sampling. Furthermore, the proposed approach can be: (i) plugged into any previous prompt-tuning approach; (ii) extended to widespread classification tasks with a large number of categories. Experimental results on 16 datasets illustrate that our method integrated with the previous approaches LM-BFF and P-tuning can yield better performance. Code is available at https://github.com/zjunlp/PromptKG/tree/main/research/Demo-Tuning.
Comment: Work in progress

arXiv.org e-Print Archive

Relphormer: Relational Graph Transformer for Knowledge Graph Representations

Author: Bi Zhen
Chen Huajun
Chen Jing
Chen Qiang
Cheng Siyuan
Guo Wei
Liang Xiaozhuan
Xiong Feiyu
Zhang Ningyu
Publication date: 14/03/2023

Transformers have achieved remarkable performance in widespread fields, including natural language processing, computer vision and graph mining. However, vanilla Transformer architectures have not yielded promising improvements in Knowledge Graph (KG) representations, where the translational distance paradigm dominates. Note that vanilla Transformer architectures struggle to capture the intrinsically heterogeneous structural and semantic information of knowledge graphs. To this end, we propose a new variant of Transformer for knowledge graph representations dubbed Relphormer. Specifically, we introduce Triple2Seq, which can dynamically sample contextualized sub-graph sequences as the input to alleviate the heterogeneity issue. We propose a novel structure-enhanced self-attention mechanism to encode the relational information and keep the semantic information within entities and relations. Moreover, we utilize masked knowledge modeling for general knowledge graph representation learning, which can be applied to various KG-based tasks including knowledge graph completion, question answering, and recommendation. Experimental results on six datasets show that Relphormer can obtain better performance compared with baselines. Code is available at https://github.com/zjunlp/Relphormer.
Comment: Work in progress

arXiv.org e-Print Archive

CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark

Author: Bi Zhen
Chang Baobao
Chen Mosha
Chen Qingcai
Huang Fei
Li Lei
Li Linfeng
Liang Xiaozhuan
Ni Yuan
Shang Xin
Si Luo
Sui Zhifang
Tan Chuanqi
Tang Buzhou
Xie Guotong
Xu Jian
Yan Jun
Yin Kangping
Yuan Zheng
Zan Hongying
Zhang Kunli
Zhang Ningyu
Zong Hui
Publication date: 06/07/2021

Artificial Intelligence (AI), along with the recent progress in biomedical language understanding, is gradually changing medical practice. With the development of biomedical language understanding benchmarks, AI applications are widely used in the medical field. However, most benchmarks are limited to English, which makes it challenging to replicate many of the successes in English for other languages. To facilitate research in this direction, we collect real-world biomedical data and present the first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark: a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification, and an associated online platform for model evaluation, comparison, and analysis. To establish evaluation on these tasks, we report empirical results with 11 current pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform far worse than the human ceiling. Our benchmark is released at https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414&lang=en-us

arXiv.org e-Print Archive

Multi-Modal Protein Knowledge Graph Construction and Applications (Student Abstract)

Author: Bi Zhen
Chen Huajun
Cheng Siyuan
Liang Xiaozhuan
Zhang Ningyu
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 06/09/2023

Existing data-centric methods for protein science generally cannot sufficiently capture and leverage biology knowledge, which may be crucial for many protein tasks. To facilitate research in this field, we create ProteinKG65, a knowledge graph for protein science. Using the Gene Ontology and the UniProt knowledge base as a basis, we transform and integrate various kinds of knowledge, aligning descriptions with GO terms and protein sequences with protein entities. ProteinKG65 is mainly dedicated to providing a specialized protein knowledge graph, bringing the knowledge of Gene Ontology to protein function and structure prediction. We also illustrate the potential applications of ProteinKG65 with a prototype. Our dataset can be downloaded at https://w3id.org/proteinkg65

Association for the Advancement of Artificial Intelligence: AAAI Publications

OntoProtein: Protein Pretraining With Gene Ontology Embedding

Author: Bi Zhen
Chen Huajun
Cheng Siyuan
Deng Shumin
Hong Haosen
Lian Jiazhang
Liang Xiaozhuan
Zhang Ningyu
Zhang Qiang
Publication date: 03/06/2022

Self-supervised protein language models have proved their effectiveness in learning protein representations. With increasing computational power, current protein language models pre-trained with millions of diverse sequences can advance the parameter scale from million-level to billion-level and achieve remarkable improvement. However, those prevailing approaches rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better protein representations. We argue that informative biology knowledge in KGs can enhance protein representation with external knowledge. In this work, we propose OntoProtein, the first general framework that incorporates the structure of GO (Gene Ontology) into protein pre-training models. We construct a novel large-scale knowledge graph that consists of GO and its related proteins, in which gene annotation texts or protein sequences describe all nodes. We propose novel contrastive learning with knowledge-aware negative sampling to jointly optimize the knowledge graph and protein embedding during pre-training. Experimental results show that OntoProtein can surpass state-of-the-art methods with pre-trained protein language models in the TAPE benchmark and yield better performance compared with baselines in protein-protein interaction and protein function prediction. Code and datasets are available at https://github.com/zjunlp/OntoProtein.
Comment: Accepted by ICLR 202

arXiv.org e-Print Archive

DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

Author: Chen Huajun
Chen Qiang
Chen Xiang
Deng Shumin
Huang Fei
Li Lei
Li Zhoubo
Liang Xiaozhuan
Qiao Shuofei
Tan Chuanqi
Tao Liankuan
Wang Peng
Xie Xin
Xiong Feiyu
Xu Xin
Yao Yunzhi
Ye Hongbin
Yu Haiyang
Zhang Ningyu
Zhang Wen
Zhang Zhenru
Zheng Guozhou
Publication date: 02/08/2022

We present an open-source and extensible knowledge extraction toolkit, DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementations for different tasks and scenarios but also organizes all components by consistent frameworks to maintain sufficient modularity and extensibility. We release the source code on GitHub at https://github.com/zjunlp/DeepKE with Google Colab tutorials and comprehensive documents for beginners. Besides, we present an online system at http://deepke.openkg.cn/EN/re_doc_show.html for real-time extraction of various tasks, and a demo video.
Comment: Work in progress and the project website is http://deepke.zjukg.cn

arXiv.org e-Print Archive

PlncRNADB: A Repository of Plant lncRNAs and lncRNA-RBP Protein Interactions

Author: Agostini F.
Andreu Paytuví Gallart
Bai Y.
Bardou F.
Bellucci M.
Buels R.
Burge S.W.
Bussotti G.
Campalans A.
Chen G.
Cirillo D.
Doroshenk K.A.
Gan X.
Gomez J.A.
Goodstein D.M.
Gupta R.A.
Guttman M.
Hao Y.
Jin J.
Kin T.
Kurihara Y.
Li L.
Liao Q.
Liu J.
Luo C.
Ma X.
Ming Chen
Morey C.
Nagano T.
Ono K.
Pang K.C.
Peijing Zhang
Ponting C.P.
Quek X.C.
Rinn J.L.
Shuai P.
Siliang Liang
Szczesniak M.W.
Tiantian Ye
Trapnell C.
Wang H.
Wang J.
Wang K.C.
Wang L.
Wu H.J.
Xiaonan Gong
Xiaozhuan Dai
Xie C.
Xu Yan
Yang Y.W.
Youhuang Bai
Zhu Q.H.
Publication venue: Bentham Science Publishers Ltd.

Crossref